 data readiness


Data Readiness for Scientific AI at Scale

Brewer, Wesley, Widener, Patrick, Anantharaj, Valentine, Wang, Feiyi, Beck, Tom, Shankar, Arjun, Oral, Sarp

arXiv.org Artificial Intelligence

This paper examines how Data Readiness for AI (DRAI) principles apply to leadership-scale scientific datasets used to train foundation models. We analyze archetypal workflows across four representative domains - climate, nuclear fusion, bio/health, and materials - to identify common preprocessing patterns and domain-specific constraints. We introduce a two-dimensional readiness framework composed of Data Readiness Levels (raw to AI-ready) and Data Processing Stages (ingest to shard), both tailored to high performance computing (HPC) environments. This framework outlines key challenges in transforming scientific data for scalable AI training, emphasizing transformer-based generative models. Together, these dimensions form a conceptual maturity matrix that characterizes scientific data readiness and guides infrastructure development toward standardized, cross-domain support for scalable and reproducible AI for science.


Exploratory Visual Analysis for Increasing Data Readiness in Artificial Intelligence Projects

Tiger, Mattias, Jakobsson, Daniel, Ynnerman, Anders, Heintz, Fredrik, Jönsson, Daniel

arXiv.org Artificial Intelligence

We present experiences and lessons learned from increasing data readiness of heterogeneous data for artificial intelligence projects using visual analysis methods. Increasing the data readiness level involves understanding both the data and the context in which it is used, challenges well suited to visual analysis. For this purpose, we contribute a mapping between data readiness aspects and visual analysis techniques suitable for different data types. We use the defined mapping to increase data readiness levels in use cases involving time-varying data, including numerical, categorical, and text. In addition to the mapping, we extend the data readiness concept to better take aspects of the task and solution into account and explicitly address distribution shifts during data collection time. We report on our experiences in using the presented visual analysis techniques to aid future artificial intelligence projects in raising the data readiness level.


AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

Hiniduma, Kaveen, Byna, Suren, Bez, Jean Luca, Madduri, Ravi

arXiv.org Artificial Intelligence

"Garbage In Garbage Out" is a maxim widely accepted by computer scientists across domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists who use AI invest a considerable amount of time and effort in preparing the data for AI. However, there are no standard methods or frameworks for assessing the "readiness" of data for AI. To provide a quantifiable assessment of the readiness of data for AI processes, we define parameters of AI data readiness and introduce AIDRIN (AI Data Readiness Inspector). AIDRIN is a framework covering a broad range of readiness dimensions available in the literature that aid in evaluating the readiness of data quantitatively and qualitatively. AIDRIN uses metrics from traditional data quality assessment, such as completeness, outliers, and duplicates, for data evaluation. Furthermore, AIDRIN uses metrics specific to assessing data for AI, such as feature importance, feature correlations, class imbalance, fairness, privacy, and FAIR (Findability, Accessibility, Interoperability, and Reusability) principle compliance. AIDRIN provides visualizations and reports to assist data scientists in further investigating the readiness of data. The AIDRIN framework enhances the efficiency of the machine learning pipeline by supporting informed decisions on data readiness for AI applications.
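Three of the metrics named in this abstract (completeness, duplicates, and class imbalance) can be illustrated with a minimal sketch. This is not AIDRIN's API or implementation: the function names, the row-as-dictionary layout, and the sample data are all assumptions made for illustration, and the definitions used here (fraction of non-missing cells, fraction of exactly repeated rows, majority-to-minority class ratio) are common conventions rather than the paper's own formulas.

```python
from collections import Counter

# Hypothetical helpers for illustration only; these are NOT AIDRIN functions.

def completeness(rows):
    """Fraction of cells that are not missing (None)."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 1.0
    return sum(value is not None for value in cells) / len(cells)

def duplicate_fraction(rows):
    """Fraction of rows that exactly repeat an earlier row."""
    if not rows:
        return 0.0
    seen = set()
    duplicates = 0
    for row in rows:
        # Sort by column name so key order does not affect the comparison.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(rows)

def imbalance_ratio(rows, label):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(row[label] for row in rows if row[label] is not None)
    if not counts:
        return 1.0
    return max(counts.values()) / min(counts.values())

# Made-up sample data: one missing value, one repeated row, classes a/b.
rows = [
    {"x": 1.0, "y": "a"},
    {"x": 2.0, "y": "a"},
    {"x": None, "y": "b"},
    {"x": 1.0, "y": "a"},
]

report = {
    "completeness": completeness(rows),
    "duplicate_fraction": duplicate_fraction(rows),
    "imbalance_ratio": imbalance_ratio(rows, "y"),
}
print(report)
```

On this sample, 7 of 8 cells are present, 1 of 4 rows is a repeat, and class "a" occurs three times as often as class "b". A fuller assessment would also cover the outlier, fairness, privacy, and FAIR checks the abstract lists, which this sketch leaves out.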


Data Readiness for AI: A 360-Degree Survey

Hiniduma, Kaveen, Byna, Suren, Bez, Jean Luca

arXiv.org Artificial Intelligence

Data are the critical fuel for Artificial Intelligence (AI) models. Poor quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Checking for data readiness is a crucial step in improving data quality. Numerous R&D efforts have been spent on improving data quality. However, standardized metrics for evaluating data readiness for use in AI training are still evolving. In this study, we perform a comprehensive survey of metrics used for verifying AI's data readiness. This survey examines more than 120 papers that are published by ACM Digital Library, IEEE Xplore, other reputable journals, and articles published on the web by prominent AI experts. This survey aims to propose a taxonomy of data readiness for AI (DRAI) metrics for structured and unstructured datasets. We anticipate that this taxonomy can lead to new standards for DRAI metrics that would be used for enhancing the quality and accuracy of AI training and inference.


Black Cape Awarded JAIC Basic Ordering Agreement for AI Data Readiness

#artificialintelligence

ARLINGTON, Va., June 07, 2022 (GLOBE NEWSWIRE) -- Black Cape, Inc., an Arlington, Virginia-headquartered dual-use technology company, has been awarded a spot on the Joint Artificial Intelligence Center (JAIC) Data Readiness for Artificial Intelligence Development (DRAID) Basic Ordering Agreement (BOA). The DRAID Program is a potential five-year, $241.6 million award focused on enabling the Department of Defense (DoD) to optimize its vast data resources to leverage AI to enhance its mission effectiveness. The multi-award BOA includes a range of tasks needed to create, acquire, curate, prepare, and manage data for use in DoD artificial intelligence and machine learning models and application development, all areas where Black Cape maintains extensive experience in the national security and defense space. "We are honored to have been selected for this important effort to bring artificial intelligence enabled tools and applications to the JAIC," said Al Di Leonardo, Co-Founder and CEO of Black Cape. Black Cape technologies are used across Government, the Intelligence Community, DoD, and US Special Operations Command (SOCOM) to provide analytic services, artificial intelligence and machine learning capabilities.


We Need to Talk About Data: The Importance of Data Readiness in Natural Language Processing

Olsson, Fredrik, Sahlgren, Magnus

arXiv.org Artificial Intelligence

In this paper, we identify the state of data as being an important reason for failure in applied Natural Language Processing (NLP) projects. We argue that there is a gap between academic research in NLP and its application to problems outside academia, and that this gap is rooted in poor mutual understanding between academic researchers and their non-academic peers who seek to apply research results to their operations. To foster transfer of research results from academia to non-academic settings, and the corresponding influx of requirements back to academia, we propose a method for improving the communication between researchers and external stakeholders regarding the accessibility, validity, and utility of data based on Data Readiness Levels (Lawrence, 2017). While still in its infancy, the method has been iterated on and applied in multiple innovation and research projects carried out with stakeholders in both the private and public sectors. Finally, we invite researchers and practitioners to share their experiences, and thus contribute to a body of work aimed at raising awareness of the importance of data readiness for NLP.


AWS Offers Course on Basics of Machine Learning - InformationWeek

#artificialintelligence

On one hand, organizations recognize the potential value of machine learning to scale operations, gain faster and deeper insights, respond to quickly changing conditions, and more. On the other hand, it's hard to get started on something that is novel to your organization. You may not have the talent in-house, and you don't have any experience. What's more, even for those organizations that have run successful pilots, many have struggled to move those pilots into production for a variety of reasons. It feels like many organizations are stuck.


Taking Matters into Your Own Hands

#artificialintelligence

See also the article by Pan et al in this issue. Safwan S. Halabi, MD, is a clinical associate professor of radiology at the Stanford University School of Medicine and serves as the medical director for radiology informatics at Stanford Children's Health. Dr Halabi's clinical and administrative leadership roles are directed at improving quality of care, efficiency, and patient safety. His current academic and research interests include imaging informatics, deep/machine learning in imaging, artificial intelligence in medicine, clinical decision support, and patient-centric health care delivery. Bone age assessment became an early AI "poster child" that demonstrated the power of applying regression and machine learning techniques to a mundane and monotonous radiologic diagnostic task.


Top firms to double number of AI projects by 2020: Gartner

#artificialintelligence

Organisations that are working with artificial intelligence (AI) or machine learning (ML) have, on average, four projects utilising these technologies in place, according to a recent survey by Gartner. The survey finds 59 per cent of respondents have deployed AI. These respondents expect to add six more projects in the next 12 months, and another 15 within the next three years. This means that in 2022, those organisations expect to have an average of 35 AI or ML projects in place, says Gartner, in its AI and ML Development Strategies study. The analyst firm says the study is based on the results of a survey it conducted in December 2018 with 106 Gartner Research Circle members. The latter is a Gartner-managed panel composed of IT and IT/business professionals.


Ready. Set. Go! Data Readiness for Artificial Intelligence (AI) - GovLoop

#artificialintelligence

Where does your organization stand? This is the second blog in a four-part series detailing the components necessary for AI success. You can read my earlier post about cultural willingness, which must be prioritized ahead of data and infrastructure readiness (this blog), workforce skilling, and plans for ethics, risk and compliance. Combining the computational power of artificial intelligence (AI) with the critical thinking ability of humans is the ideal solution for organizations looking to accelerate the discovery of actionable insights from their data assets. Even with the human expert in the loop, to achieve valid results with as little bias as possible, AI relies on large volumes of historical data and sophisticated mathematics to generate insights.